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Abstract 

We  study  bootstrap  confidence  intervals  for  three  types  of  parameters  in  Cox’s 
proportional  hazards  model:  the  regression  parameter,  the  survival  function  at  fixed 
time  points,  and  the  median  survival  time  at  fixed  values  of  a  covariate.  Several 
types  of  bootstrap  confidence  intervals  are  studied,  and  the  type  of  intervad  is  de¬ 
termined  by  two  factors.  One  factor  is  the  method  of  drawing  the  bootstrap  sam¬ 
ple.  We  consider  three  such  methods,  which  may  be  briefly  described  as  follows: 

(1)  Ordinary  resampling  from  the  empirical  cumulative  distribution  function,  (2) 
Resampling  conditional  on  the  covariates,  and  (3)  Resampling  conditional  on  the 
covariates  and  the  censoring  pattern.  Another  factor  is  the  method  of  forming  the 
confidence  interval  from  a  gi .  ?n  sample;  the  methods  considered  are  the  percentile, 
hybrid,  and  bootstrap-<.  We  provide  a  theorem  on  the  asymptotic  validity  of  the 
third  method  of  bootstrap  resampling.  All  the  methods  of  forming  confidence  inter¬ 
vals  are  compared  to  each  other  and  to  the  standard  asymptotic  method  via  a  Monte 
Carlo  study.  The  data  sets  for  this  Monte  Carlo  study  are  simulated  conditionally 
on  the  covariates  and  the  censoring  pattern,  the  situation  appropriate  for  the  third 
method  of  resampling.  One  conclusion  drawn  from  the  Monte  Carlo  study  is  that 
the  asymptotic  method  is  best  for  the  regression  parameter,  but  not  for  the  survival 
function  or  the  median  survival  time.  Conclusions  about  the  bootstrap  methods  in¬ 
clude  the  surprising  result  that,  overall,  the  second  method  of  drawing  the  samples 
outperforms  the  third  method.  Also,  there  is  an  interaction  effect  between  the  two 
factors,  method  of  drawing  the  sample  and  method  of  forming  the  interval,  espe¬ 
cially  for  estimation  of  the  regression  parameter.  Finally,  the  bootstrap-t  intervals 
are  consistently  outperformed  by  at  least  one  of  the  two  more  rudimentary  types  of 
bootstrap  interval. 

Key  words  and  phrases:  ancillarity  principle,  bootstrap-f,  hybrid  interval,  percentile  in¬ 
terval,  proportional  hazards  model 


1  Introduction  and  Summary 

The  proportional  hazards  model  of  Cox  (1972)  specifies  that  the  hcizard  rate  for  an  indi¬ 
vidual  with  covariate  vector  x  is 

A(t|x)  =  A(<)exp(^oz) 

where  is  a  vector  of  unknown  regression  coefficients  and  A,  the  underlying  baseline 
hazard  rate,  is  an  unknown  and  unspecified  nonnegative  function.  Several  parameters  of 
common  interest  in  the  Cox  model  are  the  regression  parameter  the  survival  function 
5(t|x)  associated  with  A(t|x),  and  the  quantile  of  the  distribution  of  the  lifelength  of 
an  individual  with  covariate  vector  z,  ^p(z).  The  role  of  ^1/2(2;)  is  analogous  to  the  role 
of  the  mean  response,  or  the  regression  curve  at  the  point  z,  in  linear  regression  analysis. 

For  estimation  of  ^0  and  5(t|z),  there  exists  a  well- developed  asymptotic  theory  which 
enables  the  construction  of  confidence  intervals  (Andersen  and  Gill  (1982)).  These  con¬ 
fidence  intervals  have  been  found  to  work  well  in  practice,  and  they  are  available  in 
standard  statistical  computer  packages,  such  as  SAS  and  S.  For  estimation  of  ^p(z),  the 
results  of  Andersen  and  Gill  (1982)  must  be  applied  in  conjunction  with  a  result  on  weak 
convergence  of  the  quantile  process  to  derive  asymptotic  theory  (Dabrowska  and  Doksum 
(1987),  Burr  and  Doss  (1991)).  The  expression  for  the  asymptotic  standard  error  of  the 
estimator  involves  the  hazard  rate  function  A(^p(z)),  just  as  the  formula  for  asymptotic 
standard  error  of  the  median  of  a  simple  random  sample  involves  the  underlying  density 
function.  Estimation  of  density  functions  or  hazard  rate  functions  is  a  complicated  en¬ 
deavor;  Silverman  (1986)  discusses  many  methods  which  have  been  studied.  Because  the 
asymptotic  theory  for  ^  timation  of  ^p(z)  is  only  recently  available  and  requires  estimation 
of  the  hazard  rate,  it  is  not  widely  applied. 

This  paper  concerns  the  bootstrap  method  of  forming  confidence  intervals  in  the  Cox 
model.  A  principal  reason  for  this  study  is  that  the  bootstrap  is  an  alternative  to  methods 
based  on  standard  asymptotic  theory.  Before  discussing  further  our  motivation,  it  is  useful 
to  describe  three  distinct  methods  of  bootstrapping  in  the  Cox  model. 

The  data  and  model  may  be  described  as  follows.  Associated  with  individual  i  are 
a  covariate  vector  A',,  a  lifelength  Y,,  and  a  censoring  time  C,.  We  do  not  observe 
Yi  directly,  but  rather  we  observe  Ti  =  min(yi,Ci)  and  6,  =  I{Yi  <  C,).  Thus  the 
data  is  (T,, 6i,X,),  i  =  l,...,n.  The  underlying  baseline  cumulative  hazard  function  is 
A(t)  =  /o  A(s)ds.  Then,  the  distribution  function  of  the  lifetime  of  an  individual  with  co¬ 
variate  z  is  F{t\x)  =  1  —  nu<<(l  ~  A(du))'*P^^o*^  (See  Section  2.1.)  If  /?  and  A  are  Cox’s 
(1972)  and  Breslow’s  (1972,  1974)  estimates  of  0o  and  A,  respectively,  then  F(t|z)  may 
be  estimated  by  F(t|z)  =  1  —  nu<t(l  ~  A(du))'*P^^'*^  Assume  that  the  Cj’s  are  iid  ~  G, 
and  let  G  be  the  Kaplan-Meier  estimate  of  G  based  on  the  data  (T^,  6,,  X,),  i  =  1, . . . ,  n. 

Consider  the  following  two  methods  of  bootstrapping: 

Method  1:  Resample  the  triples  (T,, t‘'i,X,),  i  =  l,...,n. 

Method  2:  Generate  Y*  ~  .^(flA"”,)  and  G*  ~  G,  i  =  l,...,n,  all  variables  indepen¬ 
dent.  Form  T*  =  min(>'^*.G*l  and  S*  =  I{Y’  <  G*).  The  resampled  data  is  then 
i  =  l,...,n. 

Efron  and  Tibshirani  (1986)  discuss  Method  1.  In  an  interesting  but  unpublished 
technical  report  which  provided  impetus  for  the  present  work,  Hjort  (1985)  proposes 
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Method  2  and  develops  some  asymptotic  theory  for  it.  Suppose  that  we  wish  to  estimate 
the  variability  of  some  estimate,  such  as  (p{x).  Method  1  is  appropriate  for  estin.;'ting 
the  unconditional  variance  of  ip{x),  i.e.  averaging  over  the  marginal  distribution  of  the 
covariates  and  of  the  censoring  variables.  Method  2  is  appropriate  for  estimating  the 
conditional  variance  of  ^p(a:)  given  X,  where  X  =  (Xi,...,X„).  If  the  distribution  of 
the  Xi's  does  not  depend  on  the  unknown  parameters  S  and  /3o  then  the  usual  ancillarity 
arguments  point  to  the  conditional  variance  as  the  “right”  quantity  to  estimate.  This 
situation  is  closely  connected  to  bootstrapping  in  linear  regression  models,  where  one 
can  bootstrap  by  resampling  from  the  pairs  (responses,  covariates),  or  one  can  bootstrap 
by  resampling  from  the  residuals;  see  Freedman  (1981).  Many  of  the  comments  in  the 
discussion  paper  Wu  (1986)  are  relevant  here. 

The  ancillarity  principle  can  be  carried  further  in  the  presence  of  censoring,  where  the 
censoring  pattern  is  an  ancillary  statistic.  The  distribution  of  the  C/s  does  not  depend  on 
the  unknown  parameters  S  and  0o\  therefore,  if  we  knew  the  C,’s  we  would  condition  on 
them.  However,  we  don’t  see  the  C0s  exactly:  If  6,  =  0  we  see  the  exact  value  of  Cj,  but  if 
=  1  we  know  only  that  C,  >  T^.  Denote  this  information  on  the  C’s  by  C.  Then,  what 
we  want  to  estimate  is  Var(^g(x)|vV,C).  This  idea  leads  to  Method  3  of  bootstrapping. 
Method  3:  Generate  Y’  ~  F(t|Xi).  \{  8i  =  0,  let  C*  =  T,;  if  <5,  =  1,  generate  C,  from 
the  Kaplan-Meier  estimate  of  G,  conditional  on  C,  >  T,,  that  is,  from  the  distribution 

(G(0-G(t;))/(i-g(t.)). 

In  deciding  to  study  Method  3  of  taking  bootstrap  samples,  we  were  motivated  by 
consideration  of  a  situation  where  this  method  appears  to  provide  a  large  improvement 
over  Method  2.  Suppose  0q  is  positive,  so  that  large  covariate  values  are  associated 
with  large  risk.  Then  a  data  point  with  a  large  covariate  value,  for  which  the  lifetime  is 
large  and  uncensored,  would  be  an  influential  point.  The  estimator  0  is  pulled  heavily 
downward  by  such  an  influential  point.  For  estimation  of  the  bias  of  0  conditional  on 
the  covariate/censoring  pattern,  it  appears  intuitively  true  that  Bootstrap  Method  3  is 
better  than  Method  2.  The  large  lifetime  will  usually  be  censored  in  Method  2  bootstrap 
samples,  and  in  effect  these  bootstrap  samples  will  be  very  similar  to  those  arising  if  the 
influential  point  did  not  exist.  In  contrast.  Method  3  forces  the  large  lifetime  to  remain 
uncensored,  and  so  this  problem  does  not  etrise. 

In  the  preliminary  study  that  motivated  our  consideration  of  Method  3,  when  we 
compared  the  bootstrap  estimates  of  bias  of  0  obtained  from  the  three  methods  of  taking 
bootstrap  samples.  Method  3  had  the  lowest  m.a.d.  (median  absolute  deviation  from  the 
true  value)  of  the  three  methods.  Details  of  an  expanded  version  of  this  study  are  given 
in  Section  3. 

We  can  now  discuss  more  fully  the  motivations  for  the  present  work.  First,  although 
asymptotic  methods  already  exist  for  the  main  interesting  parameters,  in  the  case  of  ip{x) 
they  are  difficult  to  apply.  Burr  and  Doss  (1991)  apply  the  aisymptotic  theory  to  formation 
of  confidence  bands  for  (p{x)  and  show  good  performance  of  the  bands  in  some  Monte 
Carlo  studies.  They  use  a  kernel  function  method  to  estimate  A(^p(x)),  for  which  the 
bin  width  was  carefully  chosen  to  be  suitable  for  the  particular  simulation  study.  They 
caution  that  if  it  is  not  possible  to  exert  such  care  in  selecting  the  bin  width  in  practice, 
then  the  performance  of  the  bands  may  be  adversely  affected.  In  addition,  even  when 
the  asymptotics  are  easily  implemented,  the  bootstrap  may  outperform  the  asymptotic 
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methods.  Singh  (1981)  and  Abramovitch  and  Singh  (1985)  give  situations  where  the  sam¬ 
pling  variability  of  an  estimator  is  more  accurately  estimated  using  bootstrap  procedures 
than  using  standard  asymptotic  methods. 

A  second  reason  for  this  investigation  on  bootstrap  confidence  intervals  in  the  Cox 
model  is  that  “the  bootstrap”  method  of  forming  confidence  intervals  is  not  in  fact 
uniquely  defined.  There  are  two  aspects  to  this  problem:  First,  it  is  often  true  that 
when  dealing  with  complex  data  structures  there  are  several  ways  to  draw  the  bootstrap 
samples;  second,  many  methods  for  forming  bootstrap  confidence  intervals  from  a  given 
sample  have  been  proposed.  A  major  thrust  of  the  present  work  is  to  study  the  effects  of 
method  of  taking  the  sample  and  method  of  forming  the  interval,  on  performance  of  the 
bootstrap  in  interval  estimation  of  the  parameters  of  most  interest  in  the  Cox  model. 

So  many  types  of  bootstrap  confidence  intervals  have  been  proposed  that  the  variety 
available  can  be  overwhelming.  Types  which  are  frequently  studied  in  the  literature 
include  the  percentile,  hybrid,  bootstrap-f,  and  Efron’s  BCai  all  of  which  are  discussed  by 
Martin  (1990).  Here  we  study  only  three  kinds  of  bootstrap  confidence  intervals  in  order 
to  keep  the  work  focused,  and  to  simplify  the  conclusions  drawn.  We  study  the  percentile, 
hybrid,  and  bootstrap-f  intervals.  These  intervals  are  described  in  Section  2.2  below. 

The  percentile  method  is  considered  because  even  though  the  theoretical  justification 
for  this  method  is  the  weakest  (Hall  (1988)),  these  intervals  are  the  simplest  to  use  and  ex¬ 
plain,  and  are  the  most  frequently  used  in  practice.  We  study  the  hybrid  method  because 
this  is  precisely  the  method  which  is  justified  by  asymptotic  results  for  the  bootstrap  in 
complicated  models,  such  as  the  Cox  model.  That  is,  let  6  represent  a  parameter  to  be 
estimated,  6  an  estimate  of  it,  and  6’  the  estimate  computed  from  a  bootstrap  sample. 
Suppose  we  know  that  y/n{0'  —  0)  has  almost  surely  the  same  limiting  distribution  tis 
y/n[9  —  9).  Then  we  would  want  to  use  the  distribution  of  [9’  —  9)  to  approximate  that 
of  {9  —  9)-,  if  we  do  this  to  form  confidence  intervals  for  9,  then  the  intervals  obtained 
are  the  hybrid  ones.  The  bootstrap-t  and  the  BCa  intervals  are  comparable  in  that  both 
have  been  demonstrated  theoretically  to  be  “second-order  correct”  for  one-sided  intervals 
in  some  relatively  simple  situations;  see  Hall  (1988).  The  bootstrap-t  stood  out  as  a 
star  performer  in  recent  empirical  work  of  Owen  (1988),  which  dealt  with  nonparametric 
interval  estimation  of  the  mean  from  a  random  sample.  In  addition,  the  bootstrap-^  is 
usually  more  automatic  to  apply  than  the  BCa  method.  For  these  reasons,  we  study  the 
bootstrap-t  rather  than  the  BCa  method  in  this  work.  However,  we  must  also  note  here  a 
problem  with  the  bootstrap-^  which  was  especially  apparent  in  our  work  on  applying  it  to 
forming  confidence  intervals  for  ^p(x).  The  recent  work  which  has  shown  good  behavior 
of  the  bootstrap-^  has  dealt  with  simple  cases,  such  eis  estimating  the  mean  of  a  random 
sample.  It  is  well-known  that  in  more  complicated  situations,  stability  of  the  estimate  of 
scale  needed  for  the  bootstrap-^  is  crucial  to  its  good  performance.  It  may  be  very  difficult 
to  estimate  this  scale  parameter,  and  in  fact,  the  bootstrap-t  method  is  not  uniquely  de¬ 
fined  since  many  different  scale  parameters,  and  different  estimators  of  these  parameters, 
could  be  tried.  In  this  work,  we  have  taken  the  approach  of  building  on  known  asymptotic 
results  in  order  to  lessen  the  computational  labor  of  the  bootstrap-^;  that  is,  we  estimate 
the  standard  error  of  our  estimators  by  formulas  derived  from  asymptotic  theory.  This 
approach  did  not  work  for  ^1/2(3^)-  See  Section  2.2  for  a  discussion  of  the  bootstrap-t  for 
estimation  of  ^p(x),  and  for  detailed  descriptions  of  the  three  methods  we  study. 
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Asymptotic  results  for  estimators  in  the  Cox  model  are  based  on  martingale  central- 
limit  theorems.  It  is  an  unfortunate  feature  of  this  martingale  theory  that  it  is  capable  of 
producing  only  first-order  results.  That  is,  there  are  no  tools  available  which  are  analogous 
to  the  Edgeworth  expansions  that  yield  the  currently  available  theoretical  comparisons  of 
the  various  types  of  intervals.  When  faced  with  the  impossibility  of  obtaining  a  theoretical 
comparison  of  the  various  bootstrap  methods  for  forming  confidence  intervals  using  the 
existing  theory,  comparisons  must  be  done  by  Monte  Carlo  studies.  Here,  the  Monte 
Carlo  studies  are  carried  out  in  the  situation  appropriate  for  Method  3  of  bootstrapping, 
that  is,  conditional  on  the  covariates  and  the  censoring  pattern;  this  is  explained  in  more 
detail  in  Section  3.1. 

There  is  an  enormous  number  of  possible  Monte  Carlo  studies  that  could  be  carried 
out,  particularly  since  we  condition  on  the  covariates  and  the  censoring  pattern.  We 
have  carried  out  simulations  for  a  wide  range  of  the  factors  involved.  In  choosing  which 
studies  to  present  here,  the  decision  was  complicated  by  inconsistencies  in  the  results 
among  the  different  situations  we  studied.  In  particular,  there  were  marked  changes  in 
the  relative  performances  of  the  different  methods  for  low  and  high  percent  censoring. 
We  present  in  detail  in  this  paper  the  results  for  a  situation  with  high  percent  censoring 
(55%),  over  several  sample  sizes  and  several  covariate/censoring  patterns.  We  selected 
the  high  percent  censoring  case  because  it  is  our  experience  that  many  applications  of  the 
Cox  model  involve  high  percent  censoring  due  to  a  fixed  endpoint  of  the  clinical  trial.  We 
also  report  results  of  a  single  study  with  an  influential  point  of  the  sort  described  above 
in  our  motivation  for  Method  3  of  bootstrapping. 

Before  carrying  out  the  Monte  Carlo  studies,  we  had  anticipated  that  confidence  in¬ 
tervals  would  improve  with  increasing  sophistication  of  the  method  of  drawing  bootstrap 
samples  and  method  of  forming  the  intervals.  However,  this  was  not  the  case.  Even  in 
the  restricted  set  of  studies  we  report  on  here,  the  results  are  not  simple  to  describe;  there 
is  no  single  “winner”  among  the  bootstrap  methods.  To  begin  with,  different  conclusions 
are  drawn  for  each  type  of  parameter  studied,  •?(•)»  and  For  forming  confidence 

intervals  for  /?o,  the  asymptotic  method  is  consistently  better  than,  or  at  least  cis  good  as, 
all  the  bootstrap  methods  considered.  This  is  not  true  for  the  confidence  intervals  for  5(-) 
or  ^p(-).  Nevertheless,  we  venture  to  make  the  following  general  statements  here.  One 
surprising  result  is  that  Method  2  overall  outperforms  Method  3,  except  in  the  study  with 
the  influential  point.  Also,  there  is  an  interaction  effect  between  the  two  factors,  method 
of  drawing  the  sample  and  method  of  forming  the  interval,  especially  for  estimation  of 
Finally,  the  bootstrap-^  intervals  are  consistently  outperformed  by  at  least  one  of  the  two 
more  rudimentary  bootstrap  methods. 

In  Section  5  we  deal  with  asymptotics,  and  we  consider  the  simpler  case  where  /?o  is 
known  to  be  zero.  This  corresponds  to  the  familiar  setup  of  the  Kaplan-Meier  estimator. 
In  the  case  of  the  Kaplan-Meier  estimator.  Methods  1  and  2  of  bootstrapping  are  identical 
(Efron  (1981)).  It  has  been  shown  for  Method  1  (or  Method  2)  that  the  Kaplan-Meier 
estimator  computed  from  the  bootstrap  sample,  when  standardized,  converges  weakly 
to  the  same  Gaussian  process  to  which  the  standardized  Kaplan-Meier  estimator  itself 
converges  (Akritas  (1986);  Lo  and  Singh  (1986)).  We  show  the  same  result  for  Method  3. 
Thus  Methods  1  and  3  may  be  regarded  cis  “asymptotically  equivalent.” 

The  rest  of  the  paper  is  organized  as  follows:  Section  2  gives  a  detailed  description  of 
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the  notation  and  algorithms  needed  for  estimation  of  the  Cox  model  parameters  and  for 
description  of  the  bootstrap  confidence  intervals.  Section  3  describes  the  set-up  for  the 
Monte  Carlo  studies  and  reports  results  of  the  studies.  Section  4  gives  a  summary  and 
states  conclusions.  The  theoretical  result  is  stated  and  proved  in  Section  5.  The  figures 
and  chart  for  Section  3  are  contained  in  the  appendix. 

2  Notation  and  Algorithms 

2.1  Estimation  of  the  Cox  Model  Parameters 

Here  we  describe  the  estimators  of  /3o,  5(^11),  and  ^p(x),  and  the  estimators  of  the  standard 
errors  of  these  estimators.  Several  estimators  are  possible  for  each  of  these  parameters. 
The  particular  estimator  used  has  an  impact  on  the  bootstrap  procedure,  particularly 
because  of  the  many  ties  in  the  bootstrap  samples;  therefore  it  is  necessary  to  include 
many  details  in  this  description,  which  will  be  given  in  counting  process  notation,  following 
closely  that  of  Andersen  and  Gill  (1982)  (henceforth  AG). 

In  the  counting  process  formulation  of  the  likelihood,  we  use  X,,  the  g-dimensional 
vector  of  covariates,  but  rather  than  using  Ti  and  6,  directly  we  instead  define  the  counting 
processes 


Ni(t)  —  I{Ti  <  t,  Si  =  1)  for  t  >  0 


and  the  processes 


J,(<)  =  I[Ti  >  t)  for  t  >0. 


In  this  notation,  conditional  on  A',  =  Xi, 
T  is 

n  nf 

ue[o,T]<=i  \ 


i  =  1, . . . ,  n,  the  partial  likelihood  of  at  time 

J.(u)exp(;g'x.) 

E"=i«/j(«)exp(/?'x,)  j 


The  maximum  partial  likelihood  estimator  of  /?o  at  time  t  is  the  value  $  =  ${t)  of  ^  that 
maximizes  L{I3,t).  In  practice,  of  course,  one  uses  the  value  of  0  that  maximizes  the 
partial  likelihood  at  time  oo;  see  the  discussion  in  Section  4  of  AG.  In  the  case  of  ties,  we 
use  the  usual  Peto  approximation  (Peto  (1972)).  Other  solutions  are  possible,  but  they 
are  more  computationally  intensive;  we  considered  such  solutions  impractical  because  the 
number  of  ties  in  the  bootstrap  samples  can  be  quite  large. 

Next  we  must  specify  the  estimators  of  5(t|x)  and  ^p(x),  and  for  this  it  is  first  necessary 
to  give  an  estimator  of  A.  The  “Nelson-Aalen”  estimator  of  A  is 


A(<)=  /  E«^j(5)exp(^'xj))  ^d{^Ni{s)). 
]=l  i=l 


The  estimator  A(<)  increeises  only  by  jumps,  which  occur  at  the  uncensored  deaths.  Bres- 
low’s  (1972,  1974)  estimator  of  A(t)  is  the  continuous  estimator  obtained  from  the  Nelson- 
Aalen  estimator  by  linear  int  erpolation  between  observed  failure  times.  There  are  two  gen¬ 
eral  approaches  to  estimation  of  5(<|x)  available  in  practice.  The  Tsiatis/Link/Breslow 
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approach  takes  an  estimator  of  the  form 

5(^la;)  =  exp(— A{t|ar)), 

for  some  estimator  A  of  A.  Tsiatis  (1983)  uses  this  form  for  estimation  of  S{t)  with  the 
Nelson-Aalen  estimator  of  A(<);  Link  (1984)  takes  the  same  form  but  uses  the  Breslow 
estimator  of  A{t).  We  note  that  the  relationship 

S{t)  =  exp(— A(t)) 

is  only  valid  for  continuous  T.  The  other  approach  is  to  use  the  product  integral; 
Kalbfieisch  and  Prentice  (1980,  p.  86)  provide  a  nonparametric  MLE  for  5(t|x)  using 
this  approach.  We  use  the  product  integral,  which  is  described  as  follows.  Note  that  for 
an  arbitrary  cumulative  hazard  function  ff  (which  may  contain  discrete  or  continuous 
components  or  both),  the  survival  function  corresponding  to  H  is  the  product  integral 

s())=n(i-w*))-  (2>) 

s<t 

See  Gill  and  Johansen  (1990)  or  Kalbfieisch  and  Prentice  (1980,  sec.  1.2.3).  Then,  given 
an  estimator  H{t)  of  i/,  the  product-integral  estimator  of  S{t)  is 

S(0  =  n(l-«W)-  (2-2) 

3<t 

To  specify  the  estimator  of  S{t)  it  remains  to  give  our  estimator  of  the  cumulative  hazard 
function.  We  use  the  Nelson-Aalen  estimator  of  A(f),  which  is  a  step-function  estimator. 
(We  later  do  a  linear  smooth  of  S'(t|x);  see  Remark  3  below.)  In  this  paper  we  study  the 
baseline  survival  function;  however,  we  also  need  to  decide  on  an  estimator  of  ^(f  |x)  for  the 
purpose  of  defining  an  estimator  of  (p{x).  Taking  care  to  specify  the  model  appropriately 
for  an  arbitrary  distribution,  e.g.  such  as  a  discrete  bootstrap  distribution,  we  use  the 
following  relationship  between  A(fjx)  and  A(t) 

1  -  A(d<|x)  =  (1  -  A(dOr'’‘^""^-  (2-3) 

See  Kalbfieisch  and  Prentice  (1980,  sec.  2.4.2  &  4.6.1).  The  above  relationship  combined 

with  2.2  yields  the  following  estimator  of  ^(ilx) 

5((w=n(i (2‘i) 

A  more  detailed  explanation  for  this  estimator  appears  in  Burr  and  Doss  (1991). 
Remarks  on  Computational  Details 

1  As  is  commonly  done  in  fitting  the  Cox  model,  we  assume  covariates  have  been  centered 
at  their  mean;  so,  we  take  the  baseline  hazard  function  and  baseline  survival  function 
to  be  at  the  mean  covariate  values.  That  is,  to  be  fully  precise,  in  Equation  (2.4),  the 
Nelson-Aalen  estimate  A(/)  of  A(/)  is  computed  at  mean-centered  covariates,  and  the 
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covariate  vector  x  in  the  exponent  must  be  mean-centered.  To  keep  notation  as  simple 
as  possible,  from  now  on  we  assume  covariates  have  already  been  corrected  for  the 
mean.  In  this  paper  we  study  the  estimator  of  the  baseline  survival  function,  5(t|x), 
which  we  refer  to  from  now  on  as  S{t). 

2  If  any  factor  (1  —  A(ds))  is  less  than  zero,  then  is  taken  to  be  zero. 

3  The  function  5'(t|x)  is  a  step  function,  with  jumps  at  the  observed  uncensored  survival 

times.  We  actually  use  here  a  smoothed  version  of  5(<|z),  which  may  be  de¬ 

scribed  as  follows:  Say  that  the  number  of  unique  uncensored  survival  times,  plus  the 
last  censored  observation  if  it  is  the  last  observation  overall,  is  tIu,  with  ordered  values 
denoted  Also,  set  T^o)  =  0-  For  0  <  <  <  T’„(i)/2,  5‘^(<|x)  =  1.  Let 

Wu(q  =  S{iu(i-i)\x)  —  be  the  weight  assigned  to  the  uncensored  observation 

by  5(tlx)  as  given  in  Equation  (2.4)  and  Remarks  1  and  2  above.  Then  “smear”  half  of 
Wu(i)  left,  the  other  half  right,  where  the  smearing  is  done  by  linear  interpolation  from 
Tu(i)  to  halfway  between  r„(,)  and  the  adjacent  uncensored  observation.  If  =  0  the 
estimator  is  undefined;  these  cases  were  skipped  in  the  bootstrap  resampling.  Beyond 
the  last  observation  the  linear  extrapolation  uses  the  slope  of  the  previous  seg¬ 
ment,  until  the  point  to  is  reached  for  which  =  0;  then  for  i  >  to,  =  0. 

Note  that  as  a  result  of  this  definition,  the  survival  function  does  eventually  decrease 
to  zero,  even  if  the  last  observation  is  censored. 

To  estimate  ^p(x)  we  note  that  the  quantile  of  1  —  5  is  (1  —  5)"*(p).  Here  and 
throughout  the  paper,  for  an  arbitrary  increasing  function  /,  f~^  denotes  the  right  con¬ 
tinuous  inverse  of  /  defined  by  =  sup{s  :  f{s)  <  t}.  For  the  case  of  the  survival 

function  given  by  Equations  (2.1)  and  (2.3),  this  gives 

^p(x)  =  sup{s  :  1  -  11(1  -  A(du))”P<^o^)  <  p}.  (2.5) 

t4<5 

Substituting  A  for  A  and  0  for  /3o,  we  obtain  the  estimate 

4(x)  =  sup{s  :  1  -  11(1  -  A(du))'’'P<^'")  <  p}.  (2.6) 

U<3 


In  fact,  we  use  a  continuous  version  ^p(x)  of  ^p(x)  based  on  the  continuous  version  of 
5'(t|x)  described  in  Remark  3  above.  That  is,  ^p(x)  can  be  obtained  by  solving  for  t  in  the 
equation  5‘^(t|x)  =  1  —  p.  From  now  on,  for  the  sake  of  brevity  of  notation,  the  superscript 
c  is  omitted  in  .^“^(tlx)  and  ^p(x). 

Further  notation  is  needed  in  order  to  give  the  formulas  for  the  standard  errors 
of  S{t),  and  ^p(x).  The  notation  below  follows  closely  that  of  AG.  For  a  g-vector 
w  =  {wi, . . .  ,Wg),  denotes  the  q  x  q  matrix  whose  (r,J')‘*'  entry  is  WiWj.  Define 
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Ei^J) 


”  1^1 

-y'xiJi{t)exp{l3'xi), 

-  ^  x^^Jt{t)  exp{/3'xi), 
”  /=! 

s^^K0J) 

5(«)(^,0’ 


(2.7) 


and 

It  is  shown  in  AG  that  a  consistent  estimator  of  the  asymptotic  covariance  matrix  of 
\/n0  -  /So)  is  given  by 


2^4  =  {£  V0,t)S‘-°>0,t)dK{t)y' .  (2.8) 

Next  we  discuss  estimation  of  the  standard  error  of  S{t).  The  results  of  AG  may  be 
used  to  easily  show  that  the  asymptotic  variance  of  v^(A(<)  —  A(<))  may  be  consistently 
estimated  by 

+  (2.9) 

where 

a{t)  =  /  {S^°\0,u))~^dk{u), 

Jo 

6(0  =  f  E0,u)dk{u).  (2.10) 

Jo 

Since  5(0  =  na<((l  ~  k{ds)),  we  can  apply  a  functional  version  of  the  6-method  to 
obtain  the  asymptotic  variance  of  y/n{S{t)  —  S{t)).  See  Gill  and  Johansen  (1990).  On  a 
less  technical  level,  we  can  write  5(0  =  exp(— A(0)  and  apply  the  standard  6-method. 
The  two  answers  obtained  are  identical,  and  the  variance  so  obtained  can  be  estimated 
consistently  by 

(2-11) 

Dabrowska  and  Doksum  (1987)  and  Burr  and  Doss  (1991)  derive  the  asymptotic  variance 
of  ^p(a:),  which  depends  upon  the  baseline  hazard  rate  A(0.  To  describe  our  estimator  of 
the  asymptotic  variance  of  ^p(x),  we  must  define  our  estimator  of  the  baseline  hazard  rate 
A(0.  Possibilities  include  methods  based  on  splines  (Whittemore  and  Keller  (1986))  and 
those  based  on  kernel  smoothers.  Kernel  smoothers  are  computationally  convenient,  and 
in  addition  their  asymptotic  properties  in  the  present  context  have  already  been  studied 
by  Ramlau-Hansen  (1983).  To  describe  them,  let  il  be  a  function  of  bounded  variation 
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with  support  on  [—1,1],  and  whose  integral  is  1,  and  let  {6„}  be  a  sequence  of  positive 
constants  such  that  as  n  — >  oo,  we  have  — +  0  and  nb^  — oo.  Define  the  kernel  estimate 
of  A(-)  by 

^  1  ,f  —  Cv 

X{t)  =  yJ^  /2(-^)dA(s).  (2.12) 

The  specific  choices  of  R  and  {i„)  are  discussed,  in  the  context  of  ordinary  density 
estimation,  in  Silverman  (1986,  pp.  40-72).  Details  of  our  algorithms  are  provided  in 
the  Remarks  below.  Having  specified  a  choice  of  R  and  {&„}  we  may  define  an  estimate 
of  (j^(v^^p(x)).  Using  the  notation  ir  =  log(l  —  p),  we  have  the  estimate 

^2  ^  Q(6(^))  ,  /M4(^))  +  7rxexp(-^'j)\^_i/6(4(a:))  +  7ra;exp(-^^x)x 

(A(c(i))f  ^  kU^))  ’  '  kiM)  '' 

(2.13) 


Remarks  on  the  Kernel  Estimator  of  the  Hazard  Rate 

1  In  this  paper  we  use  the  biweight  kernel 

.R(0  =  ]|(1  -  I<I<1  (2.14) 

2  In  choosing  the  window  widths  6„  we  have  followed  Ramlau-Hansen  (1983)  in  allowing 
varying  window  widths  over  the  range  of  survival  times.  These  would  generally  be 
increasing  as  t  increases  to  cope  with  the  scarcity  of  data  for  larger  t.  In  particular,  we 
allowed  four  different  intervals  of  the  time  axis  with  different  window  widths  in  each. 
The  window  widths  were  taken  to  be  inversely  proportional  to  and  the  choice  of 
constant  of  proportionality  was  made  subjectively,  using  one  or  two  simulated  data  sets 
for  each  simulation  situation. 

2.2  Description  of  Confidence  Intervals 

Four  methods  are  used  to  form  approximate  confidence  intervals  for  the  parameters  /3o, 
S{t),  and  ^p{x).  Here  we  first  give  brief  descriptions  of  these  methods,  followed  by  further 
discussion  of  the  bootstrap-^.  We  also  describe  a  method  of  refining  each  of  the  types  of 
bootstrap  intervals,  called  the  iterated  bootstrap,  which  we  do  not  study  in  this  work, 
but  which  we  refer  to  in  discussions  of  our  results. 

Denote  the  parameter  being  estimated  by  6,  its  estimator  by  9  and  the  estimate  of 
standard  error  of  6  by  a.  In  forming  the  confidence  intervals  described  below,  we  have 
used  the  estimators  6  and  d  defined  in  Section  2.1.  The  nominal  approximate  coverage 
probability  of  the  intervals  is  100(1  —  2a).  The  (1  —  0)^**  quantile  of  the  standard  normal 
distribution  is  denoted  The  standard  method,  based  on  asymptotic  theory  for  the 
Cox  model,  gives  the  interval 

(0±<T2<">).  (2.15) 

Three  bootstrap  methods  are  studied.  We  refer  to  an  estimate  computed  from  a  bootstrap 
sample  as  6",  and  we  let  K  denote  the  bootstrap  distribution  of  6'.  The  percentile  interval 
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is 

{K-\a),K--^{\-a)).  (2.16) 

In  the  hybrid  method,  the  bootstrap  distribution  of  0*  —  6  is  used  to  get  approximate 
quantiles  of  the  distribution  oi  0  —  0.  This  leads  to  the  interval 

[26  -  K-^{^  -  a),  2(9  -  K~\a)).  (2.17) 

See  Efron  (1990,  p.  14).  For  the  bootstrap-t,  consider  the  bootstrap  distribution  of 
T*  =  [0*  ~0)/&*,  where  d’  is  the  estimate  a  computed  from  a  bootstrap  sample.  Call 
this  bootstrap  distribution  Kt-  The  boo^strap-t  interval  is 

[0  -  Kt-\i  -  a)a,0  -  Kt-\Q)a).  (2.18) 

In  the  Monte  Carlo  studies  reported  in  Section  3,  the  four  types  of  confidence  intervals 
described  above  are  studied  for  all  the  parameters  considered,  except  that  bootstrap-< 
intervals  are  not  studied  for  ^i/2(  )-  Because  estimates  of  the  standard  error  of  ^i/2(-)  are 
inherently  unstable,  especially  in  situations  with  high  censoring,  the  bootstrap-t  using 
the  asymptotic  formula  for  a  did  very  poorly.  One  could  attempt  use  of  the  bootstrap 
estimate  of  standard  deviation  of  ^p(x)  in  the  denominator  of  the  bootstrap-t.  That  is,  in 
Equation  2.18  above,  use  the  bootstrap  estimate  of  a  rather  than  the  asymptotic  formula 
fo.  d.  Note  that  then,  in  order  to  compute  the  bootstrap  statistic  T*,  the  denominator 
d*  is  obtained  through  a  second  layer  of  bootstrapping,  and  the  amount  of  time  that  this 
takes  on  a  single  sample  is  so  large  that  Monte  Carlo  studies  of  this  method  are  impossible 
on  workstations  commonly  available  today.  Also,  even  with  the  bootstrap  estimate  of 
standard  deviation,  the  bootstrap-t  may  do  poorly.  Doss  and  Gill  (1991),  in  an  example 
where  they  are  estimating  the  quantiles  of  the  survival  function  iii  the  random  censorship 
model  of  survival  analysis,  had  difficulties  with  unstable  bootstrap-t  intervals  when  they 
attempted  to  use  the  bootstrap  estimate  of  standard  deviation.  On  a  theoretical  level,  the 
performance  of  bootstrap  confidence  intervals  for  quantiles  takes  on  a  different  character 
from  their  performance  for  quantities  such  as  the  mean:  The  difference  between  nominal 
and  actual  coverage  is  0(n“*^^)  not  0(n~^);  that  is,  these  intervals  are  only  first-order 
accurate,  not  second-order.  See  the  related  work  of  Hall  and  Martin  (1988).  We  make  one 
final  comment  here,  in  which  is  indicated  some  hope  for  the  bootstrap-t  in  this  situation: 
Doss  and  Gill  (1991)  try  other  measures  of  scale  for  the  denominator  of  the  bootstrap-f 
in  their  example,  and  they  settle  on  a  particular  interquantile  range  which  led  to  stable 
bootstrap-t  intervals.  However,  we  have  not  attempted  to  do  this  here. 

In  this  context,  it  is  important  to  mention  some  recent  research  on  the  iterated  boot¬ 
strap  method  of  refining  confidence  intervals  which  was  discussed  by  Beran  (1987).  De¬ 
scriptions  of  the  use  of  the  method  for  coverage  correction  of  confidence  intervals  may 
be  found  in  Martin  (1990)  and  DiCiccio,  Martin  and  Young  (1990).  Here  we  give  a  brief 
description  of  the  idea  behind  this  method.  Suppose  we  want  to  form  a  90%  percentile 
confidence  interval.  The  idea  is  to  estimate  the  actual  coverage  probability  of  the  per¬ 
centile  intervals  for  several  nominal  levels,  and  then  use  the  percentile  interval  which  has 
estimated  coverage  probability  exactly  equal  to  the  desired  level  of  90%.  The  estimation 
of  coverage  probabilities  is  done  through  a  second  layer  of  bootstrapping;  that  is,  from 
each  bootstrap  sample  used  in  forming  the  original  confidence  interval,  many  bootstrap 
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samples  are  drawn  and  the  percentile  interval  formed.  So,  this  method  is  extremely  com¬ 
putationally  intensive.  The  recent  work  of  DiCiccio,  Martin  and  Young  (1990)  is  on  an 
analytic  method  to  replace  the  Monte  Carlo  simulation  in  the  inner  layer  of  bootstrap 
sampling,  for  the  problem  of  constructing  confidence  intervals  for  a  parameter  9  that  is 
expressible  as  a  smooth  function  of  vector  means.  Their  work  does  not  apply  to  boot¬ 
strapping  in  the  Cox  model.  We  did  not  attempt  the  iterated  bootstrap  in  our  work,  but 
wo  refer  to  it  in  discussion  of  the  results  of  the  Monte  Carlo  studies. 


3  The  Monte  Carlo  Studies 

First,  we  mention  an  important  aspect  of  our  discussion  of  the  Monte  Carlo  results.  We 
compare  the  various  methods  in  terms  of  coverage  probability  and  average  or  median 
length.  A  method  with  higher  coverage  probability  and  smaller  average  length  than 
a  second  method  is  certainly  better  than  the  second  method.  However,  we  often  find 
that  the  method  with  higher  coverage  probability  also  has  larger  average  length,  and  in 
this  case  it  may  be  that  the  iterated  bootstrap  would  produce  a  good  interval.  Or,  one 
could  devise  a  way  to  compare  the  methods  by  adjusting  all  lengths  to  be  equal  and 
then  comparing  coverage  probabilities,  as  in  Owen  (1988).  We  have  not  done  this  here; 
rather,  we  discuss  the  results  simply  through  direct  comparisons  of  coverage  probability 
and  average  length,  which  generally  leads  us  to  call  some  method  “best”  for  a  particular 
situation  if  its  coverage  probability  is  the  closest  to  the  nominal  level  and  its  length  is 
not  exorbitant  relative  to  the  other  methods.  In  our  choices  of  the  “winners,”  we  admit 
to  some  conservative  prejudice  in  favor  of  accuracy  of  coverage  probability  over  shortness 
in  length. 

In  Section  3.1,  we  describe  in  a  general  way  how  we  simulate  the  data  sets  conditionally 
on  the  covariate/censoring  pattern.  In  Section  3.2,  we  list  the  factors  determining  the 
simulations  and  state  the  levels  of  these  factors  which  were  included  in  the  main  computer 
experiment  which  we  report  on  here.  In  Section  3.3,  we  summarize  the  results  of  this  main 
study.  Finally,  in  Section  3.4,  we  describe  the  study  with  an  influential  point  and  state 
briefly  the  main  results  from  it. 

3.1  General  Description  of  a  Simulated  Dataset 

Assume  we  have  decided  upon  the  sample  size  n,  the  regression  parameter  /3o,  the  co¬ 
variate  distribution  Fx,  the  lifetime  distribution  F,  and  the  censoring  distribution  G. 
The  data  sets  in  our  simulation  studies  were  generated  to  be  compatible  with  a  fixed 
covariate/censoring  pattern  according  to  the  following  steps. 

1  Get  one  set  of  data  (A'",, T,, <5,),  i  =  l,...,n  as  follows:  First,  generate  the  covariate 

values  Xi  ~  Fx,  the  survival  times  Yi  ~  F(t(A'i)  =  1  —  (1  —  and  the 

censoring  times  C,  ~  G;  next,  form  the  data  points  (X,,  T,,  6,)  where  T,  =  min(Vi,  C,) 
and  6i  =  I{Yi  <  C.). 

2  Generate  one  data  set  compatible  with  the  covariate/censoring  pattern  {X,C)  obtained 
in  Step  1,  as  follows.  For  i  =  1, . . .  ,n. 
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a.  Let  X-  =  Xi; 

b.  Generate  V/  as  in  Step  1. 

c.  Generate  C-  conditionally  on  C,  as  follows.  If  (5,  =  0,  then  C'  =  T^.  If  (5,  =  1,  then 
generate  Cj  from  the  conditional  distribution  of  C  given  that  C  >  T,.  For  example, 
if  G  is  the  Exponential  1)  distribution,  then  using  the  memoryless  property  of  the 
exponential  distribution,  we  let  C,  =  7i  +  rexp(),  where  rexp()  denotes  a  random 
draw  from  the  Exponential(l)  distribution. 

d.  Form  T'  =  min(y/,C'),  S'  =  /(V^'  <  C'). 

3  Repeat  Step  2  many  times.  (Each  time  get  confidence  intervals  for  the  parameters  of 
interest  by  the  several  methods  and  record  relevant  information  such  as  whether  the 
intervals  contain  the  true  value,  and  the  length  of  the  intervals.) 

3.2  Factor  Levels  Included  in  the  Experiment 

From  the  above  description  of  a  simulated  data  set,  it  is  clear  that  the  factors  affecting  a 
simulation  are; 

1  Fx,  the  covariate  distribution 

2  F,  the  lifetime  distribution 

3  The  Cox  regression  parameter  /3o 

4  The  form  of  the  censoring  distribution  G 

5  The  average  percent  censoring 

6  The  sample  size  n 

7  The  particular  covariate  pattern 

8  The  particular  censoring  pattern 

In  the  study  we  describe  here,  we  use  the  Uniform(0,l)  distribution  for  the  covariater. 
Also,  we  use  the  standard  Exponential  distribution  for  F,  we  take  =  2,  and  we  use 
the  Uniform  distribution  for  G.  We  take  the  mean  of  the  censoring  distribution  to  be 
.25  for  approximately  55%  average  amount  of  censoring.  The  sample  sizes  considered  are 
n  =  30,40,50,60,70,80,90,100.  For  the  smaller  sample  sizes  of  n  =  30,  n  =  40,  and 
n  =  50,  there  are  three  distinct  covariate/censoring  patterns;  for  the  larger  sample  sizes 
of  77  =  60  and  up,  there  are  two  distinct  covariate/censoring  patterns. 

We  report  90%  confidence  intervals  for  the  parameters  5(.106),  S(.255),  ^i/2(-5), 
and  ifi/2(-939).  The  values  of  t  at  which  the  function  5(-)  is  studied,  =  .106  and 
<2  =  .255,  are  the  .25  and  .50  quantiles  of  the  distribution  of  5(<|x  =  .5).  The  values  of  x 
at  which  the  function  G/2(')  is  studied,  ii  =  .5  and  X2  =  .939,  are  such  that  ^1/2(3^!)  =  h 
and  G/2(3:2)  =  t\-  The  four  types  of  confidence  intervals  are  studied  for  all  the  parameters, 
except  that  bootstrap-f  intervals  are  not  attempted  for  Ci/2(')-  The  bootstrap  intervals 
are  formed  from  bootstrap  samples  taken  by  each  of  the  three  methods  of  forming  the 
bootstrap  sample.  Therefore  ten  methods  are  studied  for  0o  and  5(-),  and  seven  methods 
are  studied  for  ifi/2(’)-  The  number  of  simulations  in  all  the  studies  is  2000. 
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3.3  Summary  of  Results  of  the  Monte  Carlo  Studies 

The  performance  of  90%  confidence  intervals  for  each  of  the  five  parameters  listed  above  is 
studied  in  terms  of  coverage  probability  and  average  length.  For  the  confidence  intervals 
for  ^i/2(a;)»  we  report  median  rather  than  mean  length;  this  is  explained  when  those  results 
axe  discussed.  The  results  of  the  studies  are  summarized  by  plots  of  coverage  probability 
versus  sample  size  and  plots  of  average  (or  median)  length  versus  sample  size.  In  studying 
the  results,  we  found  that  plotting  the  mass  of  numbers  enabled  us  to  make  much  more 
sense  of  them  than  simply  scanning  them  in  a  table.  The  plots  are  grouped  according 
to  the  type  of  parameter  being  estimated  (y0o»  ‘S'(-),  or  arranged  in  nine 

figures.  The  plots  and  an  explanation  of  them  are  in  the  Appendix. 

Consider  /3o  first,  with  results  plotted  in  Figures  1  and  2.  For  this  parameter,  we 
note  a  strong  interaction  effect  between  the  two  factors  which  determine  the  confidence 
intervals,  the  method  of  drawing  the  sample  and  the  method  of  forming  the  interval  from 
a  given  sample.  For  example,  the  variability  among  the  coverage  probabilities  of  the 
three  methods  of  drawing  the  samples  is  much  less  for  the  percentile  intervals  than  for 
the  other  two  types  of  intervals.  Also,  Method  1  of  drawing  the  sample  produces  the 
best  bootstrap-f  intervals,  but  the  worst  percentile  intervals.  Now  we  give  the  practical 
implications  of  the  results  for  ^q.  Overall,  the  asymptotic  method  seems  to  do  the  best  in 
this  study.  It  has  close  to  or  slightly  above  the  nominal  coverage  probability  throughout, 
yet  its  average  length  is  at  worst  only  very  slightly  greater  than  the  average  length  of  any 
of  the  nine  bootstrap  methods.  It  has  slightly  larger  average  length  than  the  bootstrap-/ 
intervals  (henceforth  referred  to  as  boot-/  intervals)  but  the  boot-t  intervals  all  have  low 
coverage  probability.  Among  the  bootstrap  methods,  a  close  competitor  to  the  asymp¬ 
totic  method  is  the  hybrid  interval  from  Method  2  samples  (henceforth  referred  to  as 
hybrid/Method  2).  These  intervals  give  substantially  greater  coverage  probability  than 
the  asymptotic  intervals  (and  greater  than  the  nominal  level),  at  the  cost  of  greater  aver¬ 
age  length  than  the  asymptotic  intervals.  The  difference  in  average  length  is  substantial 
only  for  the  sample  sizes  up  to  n  =  50.  We  mention  one  other  important  result  about  the 
bootstrap  methods  here.  This  is  that  the  percentile/Method  1  intervals  are  bad:  They 
have  larger  average  length  than  the  other  methods  for  sample  sizes  up  to  n  =  60,  yet  no 
better,  or  worse  coverage  probability.  The  hybrid/Method  1  intervals,  which  are  the  same 
length  as  the  percentile/Method  1  intervals,  have  very  high  coverage  probability.  So  for 
estimation  of  ^q,  when  taking  the  bootstrap  sample  by  the  simplest  method,  the  most 
obvious  methods  of  forming  bootstrap  intervals  are  bad  because  they  are  too  long.  In  the 
case  of  the  percentile  intervals,  which  are  the  most  often  used,  they  are  not  only  too  long, 
but  they  get  the  “wrong”  part  of  the  bootstrap  distribution  and  thus  have  low  coverage 
probability  as  well. 

Figures  3-6  summarize  the  results  for  estimation  of  5(-),  for  two  values  of  /.  We  can 
make  a  general  statement  here  regarding  the  methods  of  forming  the  intervals:  Roughly 
one  could  say  that  the  coverage  probabilities  of  the  boot-/  intervals  are  too  high,  the  cov¬ 
erage  probabilities  of  the  hybrid  intervals  are  too  low,  and  the  coverage  probabilities  of  the 
percentile  intervals  are  closest  to  nominal.  Next  we  give  the  practical  implications  for  this 
parameter.  The  asymptotic  intervals  have  unacceptably  low  coverage  probability  through¬ 
out;  they  lose.  (This  fact  was  noted  by  Link  (1984),  who  tried  several  transformations  to 
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improve  upon  the  standard  asymptotic  method.)  The  winner  is  the  percentile/Method  1 
interval,  with  the  percentile/Method  2  interval  a  close  second. 

In  the  case  of  estimation  of  ^(•),  all  of  the  methods  of  forming  confidence  intervals 
produced  occasional  very  long  intervals;  the  distribution  of  the  length  of  confidence  in¬ 
tervals  was  strongly  positively  skewed.  We  noted  that  for  the  three  methods  of  taking 
bootstrap  samples,  the  most  erratic  fluctuation  of  length  and  largest  mean  length  was 
with  Method  3,  then  Method  2.  Method  1  was  the  most  stable.  The  median  length 
provides  a  more  relevant  comparison  of  the  three  methods. 

Figures  7-9  show  the  results  for  estimation  of  ^(•),  for  two  values  of  x.  Rather  than 
average  length  of  the  intervals,  the  median  length  is  plotted,  in  Figure  9.  The  most 
striking  thing  about  the  plots  is  that  the  hybrid  intervals  have  extremely  low  coverage 
probability.  The  asymptotic  intervals  have  somewhat  low  coverage  probability  for  ^(xi), 
for  the  sample  sizes  of  n  =  60  and  up,  and  they  have  high  coverage  probability  for 
^(x2).  Regarding  the  performance  of  the  three  methods  of  drawing  bootstrap  samples 
in  conjunction  with  percentile  intervals.  Method  2  is  best  for  estimation  of  f(xi);  it  has 
close  to  the  nominal  coverage  probability  and  the  shortest  median  length  of  the  three 
methods.  Method  3  is  actually  the  worst  of  the  three,  at  least  for  sample  sizes  up  to 
about  n  =  80.  In  the  case  of  estimation  of  {(x2)5  Method  3  is  not  such  a  clear  loser;  it 
has  low  coverage  probability  but  also  has  markedly  shorter  median  length  than  the  other 
methods.  Methods  1  and  2  are  very  close  competitors  here. 

3.4  The  Study  With  An  Influential  Point 

This  study  consisted  of  only  one  situation,  as  contrasted  with  nineteen  situations  included 
in  the  main  study.  Therefore  it  is  reported  here  simply  to  give  an  indication  of  the 
differences  that  arise  when  an  influential  point  is  included. 

The  data  sets  in  this  simulation  study  were  generated  as  described  in  Section  3.1, 
conditional  on  a  covariate/censoring  pattern.  The  data  set  which  was  conditioned  on  weis 
handpicked  to  include  an  influential  point,  and  the  lifetime  and  censoring  distributions 
were  constructed  so  that  this  handpicked  data  set  was  a  possible  occurrence  in  the  model. 
This  Wets  done  through  the  use  of  mixture  distributions  for  the  lifetime  Y  and  the  censoring 
time  C . 

To  make  this  clear,  it  is  perhaps  best  to  spell  out  the  details  for  the  particular  situation 
considered.  The  sample  size  was  taken  to  be  n  =  30,  and  the  regression  parameter  /?o 
was  taken  equal  to  2.  Then,  Xi,  the  first  covariate  value,  w«is  chosen  to  be  unusually 
large.  That  is,  twenty-nine  out  of  the  thirty  covariate  values  were  generated  initially  from 
the  Uniform(0,l)  distribution,  whereas  Xi  was  taken  equal  to  1.5.  Since  the  regression 
parameter  is  positive,  lifetimes  associated  with  X  =  1.5  would  tend  to  be  much  smaller 
than  the  other  lifetimes.  An  influential  point  would  be  one  with  large  X  but  large  lifetime. 

The  lifetime  distribution  F  was  taken  to  be  a  mixture  of  Exponential  distributions, 
one  with  mean  50  occurring  with  probability  .04,  the  other  with  mean  1  occurring  with 
probability  .96.  For  the  unusual  lifetime  distribution  with  mean  50,  if  A'  =  1.5  the  mean 
lifelength  is  2.49. 

The  influence  of  such  a  point  is  likely  to  be  lost  by  censoring,  so  a  mixture  distribution 
on  C  was  used  to  allow  the  possibility  that  such  a  point  would  remain  uncensored.  The 
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censoring  variable  was  taken  equal  to  3  with  probability  .04  and  otherwise  Exponential 
with  mean  1.0,  so  that  average  percent  censoring  was  about  25%. 

In  getting  the  initial  data  set  as  in  Step  1  of  Section  3.1,  the  lifetime  associated  with 
the  large  covariate  value  =  1.5  was  set  equal  to  the  median  of  the  unusual  lifetime 
distribution  (mean  =  50)  for  that  value  of  X,  and  it  was  forced  to  be  uncensored. 

The  performance  of  90%  confidence  intervals  is  summarized  in  Table  1  in  the  Appendix. 
One  observation  is  that  the  asymptotic  method  is  no  longer  best  for  estimation  of 
both  Method  2  and  Method  3  percentile  intervals  beat  it  uniformly,  that  is,  with  higher 
coverage  probability  yet  shorter  average  length.  We  also  see  that  Method  3  of  drawing 
the  bootstrap  samples  does  much  better  than  it  did  in  the  other  studies.  We  note  that 
no  one  method  is  uniformly  better  than  the  others  for  all  the  parameters,  and  Methods  2 
and  3  of  drawing  the  samples  are  close  throughout. 


4  Discussion 

Our  first  comment  is  that  although  bootstrap  methods  do  not  improve  upon  the  asymp¬ 
totic  method  in  the  case  of  /3o,  they  do  compete  closely  with  the  asymptotic  method  for 
Po  and  offer  substantial  improvement  for  5(-)  and  ^i/2(')-  For  the  practitioner  who  wants 
to  use  the  bootstrap  in  the  Cox  model,  results  of  the  Monte  Carlo  studies  suggest  which 
bootstrap  methods  would  be  preferred  among  those  now  readily  available.  We  would  like 
to  be  able  to  recommend  a  single  method  appropriate  for  all  pau'ameters,  but  as  mentioned 
earlier,  this  is  not  possible.  The  performance  of  the  methods  was  markedly  different  from 
parameter  to  parameter.  A  summary  of  conclusions  drawn  from  the  Monte  Carlo  studies 
is  as  follows.  For  ^o,  we  recommend  the  hybrid/Method  2  intervals.  For  5(-)  and  ^i/2(’)» 
we  recommend  the  percentile/Method  2  intervals.  (Method  1  did  slightly  better  than 
Method  2  for  5(-),  but  its  advantage  W2is  so  slight  that  it  hardly  seems  worth  the  extra 
effort  of  using  a  different  method  of  resampling.) 

Regarding  the  methods  of  drawing  the  bootstrap  samples,  overall  this  factor  appears 
to  be  less  important  than  the  method  of  forming  the  intervals  in  its  effect  on  performance 
of  confidence  intervals.  However,  some  effects  of  this  factor  were  noted.  First,  the  most 
sophisticated  way  of  drawing  the  samples.  Method  3,  gives  much  more  erratic  results  than 
the  two  simpler  methods,  and  we  would  prefer  to  avoid  this  method.  Also,  Method  1  was 
unreliable  for  estimation  of  ^o,  and  therefore  we  would  prefer  to  avoid  this  simplest  method 
of  resampling  as  well. 

An  important  lesson  learned  from  this  work  is  that  the  choice  of  method  of  forming 
the  interval  is  not  one  to  be  made  lightly.  There  are  big  differences  among  the  methods  we 
studied  in  terms  of  coverage  probabilities.  In  particular,  the  bootstrap-t  was  very  unstable 
throughout,  and  the  hybrid  method  had  very  low  coverage  probability  for  ^i/2(3;)-  Our 
results  contrast  sharply  with  those  of  Owen  (1988),  who  included  five  bootstrap  methods 
in  a  large-scale  Monte  Carlo  study  of  confidence  intervals  for  the  mean  from  a  random 
sample,  and  found  the  bootstrap-t  to  be  a  clear  winner  throughout. 

However,  it  is  important  to  realize  that  the  present  work  does  not  provide  the  final 
word  on  bootstrap  confidence  intervals  in  the  Cox  model.  Use  of  the  iterated  bootstrap 
refinement  seems  especially  appropriate  for  the  bootstrap-t  intervals  for  0o  and  5(  ).  These 
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intervals  were  erratic  in  length  and  coverage  probability  in  that  they  were  either  very 
short  with  low  coverage  probability  or  very  long  with  high  coverage  probability.  However, 
they  never  lost  out  to  another  method  by  having  both  lower  coverage  probability  and 
greater  length;  ajid  so,  this  leads  us  to  believe  that  the  bootstrap-t  interval,  corrected 
by  the  iterated  bootstrap  method,  could  prove  to  be  useful.  The  same  comment  applies 
to  the  percentile/Method  3  intervals,  except  for  estimation  of  4i/2(^i)?  indicating  that 
Method  3  of  sampling  might  prove  useful  if  further  refined,  particularly  in  light  of  its 
good  performance  in  the  study  with  the  influential  point.  Monte  Carlo  studies  of  the 
performance  of  the  iterated  method  will  only  be  feasible  if  one  can  develop  analytic 
techniques  to  replace  the  inner  layer  of  resampling  in  this  method,  as  is  done  in  DiCiccio, 
Martin,  and  Young  (1990)  for  a  simpler  problem. 


5  Asymptotic  Validity  of  Method  3  of  Bootstrap¬ 
ping 

Asymptotics  for  the  bootstrap  in  the  Cox  model  have  been  dealt  with  to  date  in  two 
unpublished  technical  reports.  Gu  (1991)  gives  arguments  for  the  asyu'.ptotic  validity  of 
Method  1  and  Hjort  (1985)  for  asymptotic  validity  of  Method  2.  These  authors  deal  with 
use  of  the  bootstrap  to  approximate  the  distributions  of  the  estimates  of  the  regression 
parameter  and  of  the  baseline  survival  function  (Theorem  4.2  Part  5  and  Theorem  5.2 
Part  2  of  Gu  (1991);  the  proposition  on  p.  13  of  Hjort  (1985)). 

In  the  present  paper,  we  consider  Method  3  of  bootstrapping.  Instead  of  looking  at  the 
Cox  model  in  full  generality,  we  consider  the  mathematically  simpler  situation  i;;  which 
00  is  known  to  be  zero.  This  corresponds  to  the  familiar  framework  for  which  the  Kaplan- 
Meier  estimator  (henceforth  KME)  is  the  well-known  and  well-studied  estimator  of  the 
survival  function.  We  prove  that  when  bootstrap  samples  are  taken  by  Method  3,  the 
standardized  KME  computed  from  the  bootstrap  sample  converges  weakly  to  the  same 
Gaussian  process  to  which  the  standardized  KME  itself  converges.  (In  his  unpublished 
Ph.D.  dissertation,  Kim  (1990)  gives  an  alternative  proof  of  this  result.) 

Although  the  model  that  we  consider  is  a  special  case  of  the  Cox  model,  the  arguments 
that  we  use  can  be  used  to  prove  the  analogous  results  for  the  Cox  model.  The  reason  we 
consider  the  simpler  case  here  is  that  it  is  possible  to  give  a  complete  proof  in  a  reasonable 
amount  of  space.  At  the  end  of  this  section,  we  very  briefly  indicate  how  the  arguments 
need  to  be  modified  in  order  to  deal  with  the  general  model. 

The  notation  here  is  chosen  to  be  as  similar  as  possible  to  that  used  in  the  rest  of 
the  paper.  Since  there  is  no  covariate  vector  X,,  associated  with  individual  i  are  just 
a  lifelength  Yi  and  a  censoring  time  C,;  the  observed  data  are  Ti  =  min(yi,C,)  and 
6i  =  I{Yi  <  Ci).  Of  course  the  KME’s  F  and  G  of  the  lifetime  and  censoring  distributions 
depend  on  the  sample  size  n,  as  do  the  relevant  stochastic  processes.  However,  here  we 
suppress  the  subscript  n  to  simplify  notation  and  to  be  consistent  with  the  rest  of  the 
paper. 
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Theorem  1  Assume  the  random  censorship  model  where  the  failure  time  Y  has  continu¬ 
ous  distribution  function  F  and  the  censoring  timeC  has  continuous  distribution  function 
G.  Let  H  be  the  distribution  function  of  the  observed  time,  i.e.  H  =  1  — (1  — F)(l— G).  Let 
F*  be  the  Kaplan-Meier  estimate  of  F  computed  from  the  data  resampled  by  Bootstrap 
Method  3.  Then  as  n  — »  oo 


Y_  VV  in  D[0,t]  a.s.  (5.1) 

for  any  r  <  sup{<  :  H{t)  <  1},  where  W  is  a  mean  zero  Gaussian  process  with  independent 
increments  and  variance  function  given  by 


(5.2) 


The  notation  “a.s.”  in  (5.1)  signifies  that  the  weak  convergence  result  holds  along  almost 
every  infinite  sequence  (Ti,Si),(T2,S2},. . .. 


Remark  The  assumption  that  F  and  G  be  continuous  is  actually  superfluous  (we  discuss 
this  at  the  end  of  the  proof);  we  use  the  notation  F^  and  G_  so  that  the  formulas  will 
continue  to  be  valid  if  F  and  G  are  not  continuous. 


Proof  We  rely  heavily  on  Gill  (1980),  who  establishes  weak  convergence  of  the  KME 
for  the  nonbootstrapped  case,  using  the  machinery  of  counting  processes  and  martingale 
central  limit  theorems.  A  good  reference  for  this  material  is  the  first  two  chapters  of 
Fleming  and  Harrington  (1990).  To  prove  Theorem  1,  we  apply  the  arguments  of  Gill 
(1980)  conditional  on  the  particular  infinite  sequence  {(T,,  (5^);  i  =  1,2, . . .}  (we  shall  see 
below  that  the  weak  convergence  statemen’v  in  (5.1)  actually  holds  along  every  sequence 
such  that  F  —*  F  and  G  — >  G). 

For  Bootstrap  Method  3,  we  first  resample  Y*  ~  F,  C‘  ~  Li,  where  L,  is  represented 
as 


L,{t)  =  I(8i  =  0)lit  e  [T„  oo))  +  1(6,  =  1) 


(G(0-G(r.))/(<€[r.,oo)) 
1  -  G(T.) 


(5.3) 


We  then  form  T*  =  min(y^*,  C,*)  and  6'  =  I{Y‘  <  C"),  for  i  =  1, . . . ,  n.  (For  the  sake  of 
definiteness,  G  in  (5.3)  is  taken  to  be  1  at  the  largest  observation  and  beyond,  whether 
the  6  corresponding  to  this  observation  is  0  or  1.) 

Our  situation  is  complicated  by  the  following  three  factors. 

1  The  survival  distribution  function  F  varies  with  n. 

2  The  function  F  is  discontinuous.  Thus  the  standard  counting  processes  associated  with 
the  pairs  (Ti,Si)  may  jump  at  the  same  time,  so  that  they  do  not  form  a  “multivariate 
counting  process”. 

3  The  censoring  distribution  function  L,  varies  with  i  as  well  as  with  n. 
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We  refer  to  the  standardized  version  of  F’  Sls  Z’\  that  is, 


V  1  _  F{t)  j 

(5.4) 

Let 

Ft(i)  =  F(t  A  T,;,). 

(5.5) 

where  =  max{Tj*, . . . ,  T*},  and  let 

«  1_  j. 

(5.6) 

The  proof  of  Theorem  1  proceeds  in  roughly  the  following  three  steps. 

1  We  show  that  the  process  ^  €  [0,  r]}  can  be  represented  as  the  stochastic  integral 

of  a  predictable  process  with  respect  to  a  martingale  and  is  therefore  a  martingale. 

2  We  show  that  the  martingale  Q"  satisfies  Conditions  (I)  of  Theorem  4.2.1  of  Gill  (1980), 
which  gives  weak  convergence  of  Q'  to  the  process  W  with  variance  function  given  by 
(5.2). 

3  We  show  that  the  processes  Z'  and  Q“  are  asymptotically  equivalent. 

Let  K{t)  be  the  cumulative  hazard  function  associated  with  F,  i.e. 


dPjs) 


For  i  =  1, . . . ,  n,  define  the  processes 


NHt)  =  = 

v;{t)  =  i{T:>t), 

=  [  v:is)dA{s), 
J[o,t] 

Aint)  =  N:{t)-A:{t). 
Define  further,  for  each  i,  the  filtration 


It  is  well  known  that  with  respect  to  the  filtration  M’  is  a  martingale  with  pre¬ 

dictable  variation  process 

=  (5.7) 

(cf  Theorem  1.3.1  and  Theorem  2.5.2  Part  1  of  Fleming  and  Harrington  (1990)).  Let  Ft 
be  the  join  of  F/'*  for  f  =  1, . . .  ,7i,  i.e. 


JT,  =  a((/V*(s),7V;^(s));  s  <  f;  f  =  1, . . . ,  n). 


18 


Then  since  the  pairs  (Vi,C,)  are  independent,  it  is  easy  to  see  that  with  respect  to  the 
filtration  Tx^  the  M*’s  are  still  martingales  and  their  predictable  vju-iation  processes  are 
still  given  by  (5.7)  (see  for  example  Lemma  A.l  of  Doss  and  Chiang  (1990));  moreover, 
with  respect  to  Tx^  the  M*’s  are  orthogonal  (Lemma  2.6.1.  of  Fleming  and  Harrington 
(1990)). 

We  now  proceed  to  show  that  Q*  is  a  martingale.  Define  the  processes 

f=i 

VW  =  EK‘(0.  (5.8) 

r{t)  =  nv{t)>o). 

Let  r„  be  any  number  such  that  F{Tn)  <  1.  We  then  have  the  identity 

Q-{t)  =  VS  ,(  ,  ^  i'},i  «•(«)  for  (  €  |0,r„I,  (5.9) 

1  -  F{s)  F*(5) 

which  follows  from  Equation  (3.2.13)  of  Gill  (1980).  (Here  and  throughout,  0/0  is  defined 
to  be  0.)  Because  the  integrand  in  (5.9)  is  .Frpredictable,  we  see  from  (5.9)  and  the  fact 
that  M"  is  a  martingale,  that  Q’  is  the  stochastic  integral  of  a  predictable  process  with 
respect  to  a  martingale,  and  is  therefore  a  martingale. 

We  are  now  in  a  position  to  apply  Theorem  4.2.1  of  Gill  (1980),  which  gives  conditions 
that  entail  weak  convergence  of  processes  that  are  stochastic  integrals  with  respect  to 
M’.  This  theorem  is  general  enough  to  accommodate  our  situation.  It  requires  implicitly 
that  Assumption  3.1.1  of  Gill  (1980)  hold,  i.e.  that  there  exist  a  filtration  with  respect  to 
which  the  processes  M,“’s  are  orthogonal  martingedes,  with  predictable  variation  processes 
given  by  (5.7).  We  have  just  shown  that  this  structure  holds.  To  use  the  theorem,  we 
need  to  verify  Conditions  (I)  of  Theorem  4.2.1  of  Gill  (1980).  For  the  infinite  sequence 
(Ti,^i),  (72,^2),  •  ■  •,  these  conditions  are  as  follows. 

A  F  converges  uniformly  on  [0,  r]  to  F  as  n  — +  00;  A  =  /  dF  is  finite  on  [0,  t]. 

B  There  is  a  function  h  that  is  left  continuous  with  right-hand  limits  and  is  of  bounded 

variation  on  [0,r]  such  that  (^I^)  converges  uniformly  on  [0,  t]  in  bootstrap 

probability  to  h. 

C  F*(t)  — »  00  in  bootstrap  probability  as  n  — ♦  00  for  each  t  €  [0,  r]. 

We  shall  show  that  Conditions  A,  B,  and  C  are  satisfied  for  any  infinite  sequence 
(Ti,<5i),  (72,62), . . .  for  which 

sup  |F(0- F(«)l  ^0  and  sup  \G{t)  -  G{t)\  ^  0.  (5.10) 

0<<<T  0<t<T 

We  note  that  the  set  of  such  sequences  has  probability  one  by  the  results  of  Foldes,  Rejto 
and  Winter  (1980). 

So  from  now  on  we  assume  that  the  infinite  sequence  (7i,  61),  (T2, 62),  •  •  ■  satisfies 

v 

(5.10).  Let  £^.,Var,,  P*,  and  — ^  denote  expectation,  variance,  probability,  and  conver- 
gence  in  probability  under  bootstrapping,  respectively,  i.e.  these  are  taken  conditional  on 
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the  sequence  (72,^2),  •  •  ••  The  first  part  of  Condition  A  is  then  obviously  satisfied 

and  the  second  paxt  follows  from  the  definition  of  r.  We  will  show  that  Condition  B  is 
satisfied  for  h  given  by 


h  = 


1 

I -H-' 


(5.11) 


If  F  and  G  are  continuous,  then  h  is  left  continuous  with  right  limits  and  is  of  bounded 
variation  on  [0,t]. 

We  deal  first  with  the  term  n/V*  in  Condition  B,  where  V'{t)  =  I{T‘  >  t).  Note 
that  E,{V'‘{t))  =  ]Cr=i(l  ~  ~  Li-).  We  will  need  the  following  lemma,  which  gives 

a  surprising  connection  between  the  Li's  and  G. 


Lemma  1  For  Li  given  by  (5.3)  we  have 

i  EMi)  =  (?(()■ 


Proof  of  Lemma  1  Let  T^'jj  <  <  •  •  •  <  be  the  distinct  ordered  observations. 

Without  loss  of  generality  we  can  assume  that  censored  and  uncensored  observations  are 
not  tied.  If  they  are,  censored  times  are  considered  to  have  occurred  just  after  uncensored 
times,  following  the  usual  convention.  Let  <  S'^^)  <  •  •  •  <  ^(m)  corresponding 

indicator  variables  and  let  <  •  •  •  <  be  the  number  of  ties.  Writing 

1  =  I{6i  =  0,  Ti  <  t)  +  7(6,  =  1,  Ti  <  f)  -f  I{Ti  >  f),  we  have 


=  T.  <f)  +  /(6,  =  i,  r,  +  >f)) 

”  i=i  i=i 


--f^/(5.  =  0,  T,<t) 


”  ^  G{Ti) 


(=1 


-CtiiT.  >  ()  +  f;  =  1,  r,  <  ()) 

"  'fer  fcr  G(Ti)  > 

^»=i  i.T'<t  G{Ti )  ' 


Since  G{t)  is  a  self-consistent  estimator  we  have 


G(i;-i(E/(7;>o+  E 

N=1  •:T'<fCr(7j  ^ 

See  for  example  Miller  (1981,  pp.  52-57)  for  the  definition  of  the  self-consistency  property 
and  a  proof  that  it  holds  for  the  KME.  His  proof  is  for  the  Ccise  of  no  ties  but  can  be 
ea.sily  modified  for  the  case  of  ties  with  the  above  notation.  Hence  we  have 

"fct 
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and  this  proves  the  lemma. 


From  the  lemma  we  see  that  £,(V*(t)/n)  =  1  —  H-{t)  and  Var.(V*(t)/n)  0. 

Therefore, 

-  (1  -  H-it))  0  (5.12) 

n 

for  each  t.  As  explained  on  p.  70  of  Gill  (1980),  the  results  of  van  Zuijlen  (1978)  give  that 
the  convergence  in  (5.12)  is  uniform  over  R.  By  (5.10),  H{t)  — >  H{t)  uniformly  on  [0,  t], 
and  this  implies  that 


n 


0  uniformly  for  t  G  [0,r]. 


(5.13) 


By  Theorem  4.1.1  of  Gill  (1980)  we  have  supo<«<T  l^*(0  ~  ^(01  0-  This,  together 

with  (5.10)  gives 

2 _ p*  ^  2 _ p 

- ^  uniformly  for  i  G  [0,  r].  (5.14) 

\  —  F  \  —  F 

Let  us  now  consider  J’{t)  defined  by  (5.8).  We  have 


P»{J*{t)  ^  1  for  some  t  G  [0,  r])  = 

i=l 

«=1 

<  nexp(-F_(r)(l  -  T,_(t)))  (5.15) 

i=l 

=  exp(— nF_(T)G_(T)^ 

=  exp(^—nH-{T)^ 

— >  0, 


where  in  the  above  the  third  line  follows  from  the  inequality  1  —  x  <  exp(— x)  and 
the  fourth  follows  from  Lemma  1.  This,  combined  with  (5.13)  and  (5.14),  shows  that 
Condition  B  is  satisfied  for  any  sequence  (Ti,^i),  (r2,  ^2),  •  •  •  satisfying  (5.10).  Condition 
C  is  trivially  satisfied  by  (5.12). 

We  can  now  apply  Theorem  4.2.1  of  Gill  (1980)  to  conclude  that  (5.1)  holds  if  y/n  ( ) 

is  replaced  by  Q*.  (Note  that  since  as  n  — +  00  we  have  F{t)  — >  F{t)  <  1,  the  represen¬ 
tation  (5.9)  is  valid  over  [0,t]  for  large  n.)  To  see  that  the  difference  between  Z*  and  Q’ 
is  negligible,  we  note  that 

P.{  sup  |(3-(i)  -  Z-(()|  0}=  P.{Tl„,  <  r)  ^  0 


by  (5.15).  This  concludes  the  proof  of  Theorem  1. 

For  the  case  in  which  F  and  G  have  discontinuities,  the  same  result  can  be  proved  by 
the  arguments  of  Akritas  (1986,  p.  1037).  If  we  look  carefully  at  those  arguments,  we  see 
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that  the  distributions  of  the  censoring  variables  C’  has  no  role,  so  that  the  arguments 
can  be  applied  directly  to  our  situation. 

Let  us  now  return  to  the  Cox  model,  and  suppose  that  Conditions  A-D  of  AG  are 
satisfied.  Gu  (1991)  and  Hjort  (1985)  show  that  if  yS*  and  5*(-)  are  the  estimates  of  the 
regression  parameter  and  baseline  survival  function  computed  from  the  data  resampled  by 
Method  1  or  2,  then  with  probability  one,  —  S*(-)  — *?(•))  converges  in  distribution 

to  the  same  process  to  which  the  process  —  5'(-))  converges.  Gu  deals  with 

Method  1  and  Hjort  with  Method  2.  Their  proofs  rely  heavily  on  the  machinery  developed 
in  AG.  They  both  focus  effort  on  establishing  second-order  properties,  whereas  here  we  are 
concerned  with  first-order  results  only.  To  prove  the  same  result  for  Bootstrap  Method  3 
one  proceeds  in  a  similar  way.  The  heart  of  the  proof  involves  checking  Condition  B 
(“asymptotic  stability”)  and  this  is  done  via  an  analogue  of  Lemma  1.  Unfortunately,  the 
details  needed  to  give  a  rigorous  proof  are  lengthy  and  not  straightforward,  and  this  is 
the  reason  we  have  limited  our  result  to  the  case  where  the  regression  parameter  is  known 
to  be  0. 
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Appendix 


This  appendix  contains  the  plots  referred  to  in  Section  3.3,  and  Table  1  referred  to 
in  Section  3.4.  Here  we  explain  in  some  detail  the  construction  of  the  plots.  The  plots 
summarize  the  results  of  19  (nineteen)  simulation  studies,  all  from  the  same  main  set-up: 

1  Fx,  the  covariate  distribution,  is  Uniform(0,l). 

2  F,  the  lifetime  distribution,  is  standard  Exponential. 

3  The  Cox  regression  parameter  /3o  is  2. 

4  The  form  of  the  censoring  distribution  G  is  Uniform. 

5  The  average  percent  censoring  is  55%  (The  mean  of  G  is  .25). 

The  19  studies  are  for  the  sample  sizes  n  =  30,40,50,60,70,80,90,100,  with  three 
distinct  covariate/censoring  patterns  for  the  sample  sizes  n  =  30,40,50  and  two  distinct 
covariate/censoring  patterns  for  the  sample  sizes  n  =  60, . . . ,  100. 

The  results  are  summarized  with  plots  of  coverage  probability  versus  sample  size  and 
plots  of  average  (or  median)  length  versus  sample  size.  We  found  in  studying  the  results 
that  plotting  the  mass  of  numbers  enabled  us  to  make  much  more  sense  of  them  than 
simply  scanning  them  in  a  table. 

Consider  first  the  results  we  need  to  summarize  for  and  5(-).  There  are  ten  types 
of  confidence  intervals  studied,  the  asymptotic  and  nine  types  of  bootstrap  intervals.  A 
bootstrap  interval  is  determined  by  two  factors-the  method  of  sampling  and  the  method 
of  forming  the  interval  from  a  given  sample. 

For  plotting  the  coverage  probabilities  of  the  ten  methods  in  the  nineteen  studies,  we 
would  like  to  be  able  to  compare  the  results  for  the  ten  methods  in  a  single  plot  of  coverage 
probability  versus  sample  size,  but  this  produced  too  much  overlap:  We  could  not  distin¬ 
guish  readily  the  symbols  for  the  ten  different  methods.  So,  here  we  use  three  different 
plots,  corresponding  to  the  three  methods  of  forming  the  intervals  from  a  given  sample. 
Thus  the  plot  labelled  “Percentile,”  for  instance,  contains  coverage  probabilities  for  three 
bootstrap  methods:  Percentile/ Method  1,  Percentile/Method  2,  and  Percentile/Method3. 
The  coverage  probabilities  for  the  .Asymptotic  method  are  included  on  each  of  the  three 
plots,  so  that  this  method  can  easily  be  compared  to  any  bootstrap  method  and  also  as 
a  reference  to  enable  easier  comparison  of  the  results  on  the  three  different  plots.  The 
horizontal  line  at  coverage  probability  .90  also  enables  easier  comparison  among  the  three 
plots.  Two  additional  reference  lines  are  drawn  at  coverage  .90  ±  2(.90  x  .10/2000)*^^.  If 
a  method  ha.s  exact  coverage  .90  then  roughly  95%  of  the  observed  coverages  should  lie 
within  the  band  formed  by  these  lines. 

With  only  four  types  of  points  to  follow  in  one  plot,  there  is  relatively  little  problem 
with  overlap.  Here  we  are  adhering  to  graphical  principles  put  forth  in  Cleveland  (1985. 
Ch.  3). 

The  horizontal  axis  in  these  plots  is  labelled  sample  size.  However,  at  each  sample 
size  we  had  either  three  or  two  separate  studies.  We  wanted  to  be  able  to  distinguish 
the  results  of  the  different  studies.  Therefore,  for  the  first  covariate/censoring  pattern  at 
n  —  30,  we  have  plotted  the  coverage  probabilities  for  the  four  methods  exactly  vertically 
aligned  above  the  sample  size  n  =  30;  for  the  second  covariate/censoring  pattern  at 


n  =  30,  we  have  plotted  the  coverage  probabilities  for  the  four  methods  exactly  vertically 
aligned,  slightly  to  the  right  of  n  =  30,  and  so  forth.  We  have  used  lines  to  connect 
symbols  for  the  same  method  in  different  studies,  both  to  enable  the  reader  to  more  easily 
see  the  trend  with  sample  size,  and  to  aid  the  reader  in  identifying  all  points  from  the 
same  study.  That  is,  for  example,  without  the  lines  it  may  be  difficult  to  judge  which 
points  come  from  the  second  covariate/censoring  pattern  for  n  =  30.  So  this  plot  is  a 
connected  symbol  graph,  a  kind  of  graph  which  is  often  used  in  time  series  but  which  has 
other  uses  as  well;  see  Cleveland  (1985,  p.  181  and  pp.  188-189). 

The  lengths  of  the  percentile  and  hybrid  intervals  are  the  same,  so  there  are  only  two 
plots  of  average  length  for  and  S{-).  Since  bootstrap-t  intervals  were  not  attempted  for 
^i/2(')?  there  are  only  two  plots  of  coverage  probability  for  ^i/2(‘)  s^nd  one  plot  of  median 
length.  For  each  of  the  five  parameters  studied,  there  are  two  figures,  one  for  coverage 
probability  and  one  for  average  length.  The  only  exception  to  this  is  that  the  tw'o  plots 
of  median  length  for  the  two  values  of  x  at  which  ^1/2(2?)  is  studied  are  put  into  a  single 
figure.  Figure  9. 
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Figure  1 :  Coverage  Probability  vs.  Sample  Size 

for  Beta 
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Figure  2:  Average  Length  vs.  Sample  Size 

for  Beta 
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Figure  3:  Coverage  Probability  vs.  Sample  Size 

for  S(.106) 

1  •  Method  1,2-  Method  2, 3  -  Method  3,  a  -  Asymptotic 
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Figure  4:  Average  Length  vs.  Sample  Size 
for  S(.106) 
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Figure  5:  Coverage  Probability  vs.  Sample  Size 

for  S(.255) 

1  -  Method  1,2-  Method  2, 3  -  Method  3,  a  -  Asymptotic 
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Figure  6:  Average  Length  vs.  Sample  Size 
for  S(.255) 
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Figure  7:  Coverage  Probability  vs.  Sample  Size 
for  Median  Survival  at  X  =  .5 

1  -  Method  1,2-  Method  2, 3  -  Method  3.  a  -  Asymptotic 
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Figure  8:  Coverage  Probability  vs.  Sample  Size 
for  Median  Survival  at  X  =  .939 
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Figure  9;  Median  Length  of  Percentile  and  Hybrid  Intervals  vs.  Sample  Size 

for  Median  Survival  at  Two  X  Values 


1  »  Method  1.2*  Method  2. 3  «  Method  3,  a  »  Asymptotic 

X=.5 


30 


40 


50  60  70 


80 


90 


100 


X=.939 


30  40  so  60  70  80  90  100 


Table  1.  The  Study  with  an  Influential  Point:  Coverage  Probabilities  and  Average  or 
Median  Length  of  Confidence  Intervals. 


/? 

S(.IO) 

5(.25) 

a.54) 

^(.99) 

Cov. 

Ave. 

Cov. 

Ave. 

Cov. 

Ave. 

Cov. 

Med. 

Cov. 

Med. 

Pr. 

Len. 

Pr. 

Len. 

Pr. 

Len. 

Pr. 

Len. 

Pr. 

Len. 

PI 

.88 

2.80 

.88 

.253 

.89 

.342 

.89 

.262 

.89 

.145 

2 

.85 

2.32 

.88 

.250 

.89 

.327 

.89 

.256 

.86 

.139 

3 

.85 

2.28 

.90 

.262 

.90 

.330 

.90 

.251 

.87 

.139 

HI 

.88 

2.80 

.84 

.253 

.89 

.342 

.77 

.262 

.78 

.145 

2 

CO 

CO 

2.32 

.84 

.250 

.87 

.327 

.76 

.256 

.77 

.139 

3 

.80 

2.28 

.84 

.262 

.86 

.330 

.75 

.251 

.76 

.139 

T1 

.81 

2.29 

.95 

.301 

.97 

.407 

2 

.79 

2.25 

.95 

.293 

.96 

.392 

3 

.78 

2.24 

.95 

.293 

.95 

.381 

A 

.84 

2.35 

.86 

.246 

.85 

.298 

.86 

.232 

.92 

.150 

Note:  P  =  percentile  intervals,  H  =  hybrid  intervals,  T  =  boot-t  intervals,  A  =  asymptotic 
intervals.  1,  2,  3  refer  to  Methods  1,  2,  and  3,  respectively,  of  forming  the  bootstrap 
samples. 
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